On the Integration of Structured Data and Text: A Review of the SIRE Architecture (invited talk)

Author

  • Ophir Frieder
Abstract

1.0 Introduction

Over the past decade, members of the Information Retrieval Lab have designed, developed, and deployed a variety of information retrieval systems. A central theme of all these systems has been the integration of structured data and text. One of our more recent efforts, SIRE, a Scalable Information Retrieval Engine [Grossman97, Grossman98, Lundquist99], is the focus of this paper. For completeness, I review some of SIRE's functionality, although it is described in greater detail in other forums. We describe the architecture of the prototype that some members of the laboratory developed for the National Institutes of Health (NIH) National Center for Complementary and Alternative Medicine (NCCAM) [Frieder00]. The version deployed at NCCAM is a more industrialized version of this prototype.

The mainstream approach to developing information retrieval systems uses a customized inverted index to represent the text. SIRE, by contrast, takes a relational approach to information retrieval and uses relations to model the inverted index. Storing the full text in a relational environment integrates the search of unstructured data with the traditional structured data search of Relational Database Management Systems (RDBMS). By using only standard SQL, SIRE leverages the investment of the commercial relational database industry while providing all the capabilities of a more traditional information retrieval approach. RDBMS offer a wide range of functionality, including concurrency control, recovery, security, portability, scalability, and robustness. RDBMS vendors continuously improve these features and incorporate advances in hardware and software. Thus, an application built on an RDBMS can keep up with the technology curve with less investment than a custom solution. SIRE is implemented strictly as a relational database application.
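To make the idea of modeling an inverted index with relations concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. The table names (`doc`, `posting`), their columns, and the helper functions are hypothetical illustrations and are not taken from SIRE; the point is only that one ordinary SQL statement can combine a keyword search with a structured predicate (here, publication year).

```python
import sqlite3
from collections import Counter


def build_index(docs):
    """Load (doc_id, year, text) tuples into two relations.

    `doc` holds the structured attributes; `posting` plays the role of
    the inverted index, one row per (term, document) pair.
    Both schemas are assumptions made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE doc (doc_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE posting (term TEXT, doc_id INTEGER, tf INTEGER);
    """)
    for doc_id, year, text in docs:
        conn.execute("INSERT INTO doc VALUES (?, ?)", (doc_id, year))
        counts = Counter(text.lower().split())  # naive whitespace tokenizer
        conn.executemany(
            "INSERT INTO posting VALUES (?, ?, ?)",
            [(term, doc_id, n) for term, n in counts.items()],
        )
    return conn


def search(conn, terms, min_year):
    """Return ids of documents that contain ALL terms and satisfy year >= min_year."""
    terms = sorted({t.lower() for t in terms})
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    sql = f"""
        SELECT d.doc_id
        FROM doc d JOIN posting p ON p.doc_id = d.doc_id
        WHERE p.term IN ({placeholders}) AND d.year >= ?
        GROUP BY d.doc_id
        HAVING COUNT(DISTINCT p.term) = ?
        ORDER BY d.doc_id
    """
    rows = conn.execute(sql, (*terms, min_year, len(terms)))
    return [row[0] for row in rows]
```

Because both the text index and the structured attributes live in ordinary tables, the text condition and the year filter are evaluated by the same query, which is the kind of integration the paragraph above describes.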
Key information retrieval techniques such as leading similarity measures, proximity searching, n-grams, passages, phrase indexing, and relevance feedback are all implemented using standard SQL. By adhering strictly to standard SQL, SIRE is completely portable across platforms and database management systems. Thus far, laboratory members, industrial collaborators, or sponsors have implemented SIRE on the NCR DBC-1012, Microsoft SQL Server, Sybase, Oracle, IBM DB2, and SQL/DS database management systems. SIRE achieves good performance and scalability and is currently in production use in a variety of text search applications, both commercially, as part of multiple industrial efforts, and in various government laboratories, including the National Institutes of Health National Center for Complementary and Alternative Medicine. The relational platform offers capabilities equivalent to those of traditional Information Retrieval (IR) systems while providing a parallel, scalable, maintainable architecture and a unified platform for integrating searches of structured and unstructured data.
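As an illustration of how a similarity measure might be expressed in standard SQL, here is a minimal sketch of an unnormalized tf-idf dot product computed with a join and a `GROUP BY`. The schema (`posting`, `term`, `query`), the whitespace tokenizer, and the `rank` helper are all assumptions made for this example; the abstract does not give SIRE's actual schema or its exact weighting formula. The idf values are computed in Python and stored in a table so that the SQL itself needs only joins, `SUM`, and arithmetic.

```python
import math
import sqlite3
from collections import Counter


def rank(docs, query):
    """Rank documents against a query by an unnormalized tf-idf dot product.

    docs:  dict mapping doc_id -> text
    query: free-text query string
    Returns a list of (doc_id, score), best match first.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posting (term TEXT, doc_id INTEGER, tf INTEGER);
        CREATE TABLE term (term TEXT PRIMARY KEY, idf REAL);
        CREATE TABLE query (term TEXT, tf INTEGER);
    """)
    df = Counter()  # document frequency of each term
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        df.update(counts.keys())
        conn.executemany(
            "INSERT INTO posting VALUES (?, ?, ?)",
            [(t, doc_id, n) for t, n in counts.items()],
        )
    n_docs = len(docs)
    conn.executemany(
        "INSERT INTO term VALUES (?, ?)",
        [(t, math.log(n_docs / d)) for t, d in df.items()],
    )
    conn.executemany(
        "INSERT INTO query VALUES (?, ?)",
        list(Counter(query.lower().split()).items()),
    )
    # Score = sum over shared terms of (query weight * document weight),
    # where each weight is tf * idf.
    sql = """
        SELECT p.doc_id, SUM(q.tf * t.idf * p.tf * t.idf) AS score
        FROM query q
        JOIN posting p ON p.term = q.term
        JOIN term t ON t.term = p.term
        GROUP BY p.doc_id
        ORDER BY score DESC, p.doc_id
    """
    return conn.execute(sql).fetchall()
```

Only documents that share at least one term with the query appear in the result. A term that occurs in every document has an idf of zero and therefore contributes nothing to the score.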

Publication date: 2000